Efficient Exploration in Reinforcement Learning

Author

  • Sebastian B. Thrun
Abstract

Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. While the former family is closely related to random walk exploration, directed exploration techniques memorize exploration-specific knowledge which is used for guiding the exploration search. In many finite deterministic domains, any learning technique based on undirected exploration is inefficient in terms of learning time, i.e., learning time is expected to scale exponentially with the size of the state space (Whitehead, 1991b). We prove that for all these domains, reinforcement learning using a directed technique can always be performed in polynomial time, demonstrating the important role of exploration in reinforcement learning. (The proof is given for one specific directed exploration technique named counter-based exploration.) Subsequently, several exploration techniques found in recent reinforcement learning and connectionist adaptive control literature are described. In order to trade off efficiently between exploration and exploitation (a trade-off which characterizes many real-world active learning tasks), combination methods are described which explore and avoid costs simultaneously. This includes a selective attention mechanism, which allows smooth switching between exploration and exploitation. All techniques are evaluated and compared on a discrete reinforcement learning task (robot navigation). The empirical evaluation is followed by an extensive discussion of benefits and limitations of this work.
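To make the contrast between undirected and directed exploration concrete, here is a minimal Python sketch. It is not taken from the paper: the "reset chain" environment, the function names, and the specific selection rule (try untried actions first, then move toward the least-visited known successor) are illustrative assumptions, a simplified stand-in for the counter-based exploration the abstract refers to.

```python
import random

RESET, FORWARD = 0, 1  # hypothetical action labels for this toy example


def step(state, action, n):
    """Deterministic chain: FORWARD advances one state, RESET returns to state 0."""
    return min(state + 1, n) if action == FORWARD else 0


def random_walk(n, rng, max_steps=1_000_000):
    """Undirected exploration: choose actions uniformly at random until state n is reached."""
    state, steps = 0, 0
    while state != n and steps < max_steps:
        state = step(state, rng.choice((RESET, FORWARD)), n)
        steps += 1
    return steps


def counter_based(n, max_steps=1_000_000):
    """Directed exploration (simplified): prefer untried actions, then the
    action whose known successor state has been visited least often."""
    visits = [0] * (n + 1)   # per-state visit counters
    model = {}               # learned transitions: (state, action) -> next state
    state, steps = 0, 0
    visits[0] = 1
    while state != n and steps < max_steps:
        def score(action):
            nxt = model.get((state, action))
            # Untried actions are scored as leading to an unvisited state.
            return 0 if nxt is None else visits[nxt]
        # Ties are broken in favour of RESET, i.e. the less helpful action.
        action = min((RESET, FORWARD), key=score)
        nxt = step(state, action, n)
        model[(state, action)] = nxt
        visits[nxt] += 1
        state, steps = nxt, steps + 1
    return steps
```

Calling `counter_based(10)` and `random_walk(10, random.Random(0))` returns the number of steps each strategy needs to reach the end of a 10-state chain. In this toy environment the random walk must produce n consecutive FORWARD choices, so its expected time is 2^(n+1) - 2, whereas the counter-based agent pays roughly one reset plus one walk back per newly discovered state, which grows on the order of n².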

Similar resources

To appear in: Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches

Whenever an intelligent agent learns to control an unknown environment, two opposing objectives have to be combined. On the one hand, the environment must be sufficiently explored in order to identify a (sub-) optimal controller. For instance, a robot facing an unknown environment has to spend time moving around and acquiring knowledge. On the other hand, the environment must also be exploited du...

Efficient Exploration in Reinforcement Learning

Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. Whil...

Active Reinforcement Learning with Monte-Carlo Tree Search

Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-C...

Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous expe...

On Planning and Exploration in Non-discrete Environments

The application of reinforcement learning to control problems has received considerable attention in the last few years [And86, Bar89, Sut84]. In general there are two principles to solve reinforcement learning problems: direct and indirect techniques, both having their advantages and disadvantages. We present a system that combines both methods [TML91, TML90]. By interaction with an unknown envi...

Publication date: 1992